A Framework for Privacy Preserving Classification in Data Mining
Abstract
Nowadays, organizations all over the world depend on mining gigantic datasets. These datasets typically contain sensitive individual information, which inevitably gets exposed to different parties. Consequently, privacy issues are constantly in the limelight, and public dissatisfaction may well threaten the practice of data mining and all its benefits. It is thus of great importance to develop adequate security techniques for protecting the confidentiality of individual values used for data mining. In the last 30 years, several techniques have been proposed in the context of statistical databases. It was noticed early on that careless noise addition introduces biases into statistical parameters, including means, variances and covariances, and sophisticated techniques that avoid these biases were developed. However, when these techniques are applied in the context of data mining, they do not appear to be bias-free. Wilson and Rosen (2002) suggest the existence of Type Data Mining (DM) bias, which relates to the loss of underlying patterns in the database and cannot be eliminated by preserving simple statistical parameters. In this paper we propose a noise addition framework specifically tailored towards the classification task in data mining. It builds upon some previous techniques that introduce noise to the class and the so-called innocent attributes. Our framework extends these techniques to the influential attributes; additionally, it caters for the preservation of the variances and covariances, along with patterns, thus making the perturbed dataset useful for both statistical and data mining purposes. Our preliminary experimental results indicate that data patterns are highly preserved, suggesting the absence of DM bias.
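The variance bias mentioned in the abstract can be illustrated with a small sketch. This is not the framework proposed in the paper; it is a minimal example, assuming a single numeric attribute, independent zero-mean Gaussian noise, and a simple linear rescaling as the correction. The function names `add_noise` and `rescale_to_original_moments`, the sample size, and the noise level are all hypothetical choices made for illustration.

```python
import random
import statistics


def add_noise(values, noise_sd, rng):
    """Naive perturbation: add independent zero-mean Gaussian noise to each value."""
    return [v + rng.gauss(0.0, noise_sd) for v in values]


def rescale_to_original_moments(perturbed, original):
    """Linearly rescale perturbed values so that their sample mean and
    variance match those of the original values.

    Illustrative correction only; not the method described in the paper.
    """
    mu_o, sd_o = statistics.fmean(original), statistics.pstdev(original)
    mu_p, sd_p = statistics.fmean(perturbed), statistics.pstdev(perturbed)
    return [mu_o + (p - mu_p) * (sd_o / sd_p) for p in perturbed]


rng = random.Random(0)  # fixed seed so the run is repeatable
original = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
naive = add_noise(original, noise_sd=5.0, rng=rng)
corrected = rescale_to_original_moments(naive, original)

# Naive noise inflates the variance by roughly noise_sd ** 2;
# the rescaled values restore the original sample variance.
print("original variance: ", statistics.pvariance(original))
print("naive variance:    ", statistics.pvariance(naive))
print("corrected variance:", statistics.pvariance(corrected))
```

Matching the mean and variance of one attribute in this way says nothing about covariances between attributes or about the patterns a classifier would learn, which is the kind of loss the abstract refers to as DM bias.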
Similar resources
A centralized privacy-preserving framework for online social networks
There are some critical privacy concerns in current online social networks (OSNs). Users' information is disclosed to entities that were not supposed to access it. Furthermore, the notion of friendship is inadequate in OSNs since the degree of social relationships between users changes dynamically over time. Additionally, users may define similar privacy settings for their f...
Classification and Evaluation the Privacy Preserving Data Mining Techniques by using a Data Modification-based Framework
In recent years, data mining techniques have met a serious challenge due to increased concerns about privacy, that is, protecting the privacy of critical and sensitive data. Different techniques and algorithms have already been presented for privacy preserving data mining, which can be classified into three common approaches: Data modification approach, Data sanitizat...
Classification and evaluation the Privacy Preserving Distributed Data Mining techniques
In recent years, data mining techniques in various areas have met serious challenges from increasing concerns about privacy. Different techniques and algorithms have already been presented for privacy preserving data mining (PPDM), which can be classified into two scenarios: centralized data scenario and distributed data scenario. This paper presents a Framework for classification and evaluation of ...
Privacy Preserving Data Mining Techniques: Challenges & Issues
Privacy preservation has become an important issue in the development of various data mining techniques. In this paper, we have discussed various techniques to preserve privacy while mining data. In the absence of a uniform framework across all data mining techniques, researchers have focused on privacy preserving issues specific to individual data mining techniques. Available framework and algorithms provide further insig...
Mining Multiple Private Databases using a Privacy Preserving kNN Classifier
Data mining technologies are popular for identifying interesting patterns and trends in large amounts of data. With the advent of high speed networks and easily available storage, many organizations are able to collect large amounts of data. On one hand, these organizations would like to mine their data to understand and discover interesting patterns; on the other hand, many legal and commercia...
Publication date: 2004